Abstract. Tracking deforming objects involves estimating the global
motion of the object and its local deformations as functions of time.
Tracking algorithms using Kalman filters or particle filters have been
proposed for tracking such objects, but these have limitations due to the
lack of dynamic shape information. In this paper, we propose a novel
method based on employing a locally linear embedding in order to
incorporate dynamic shape information into the particle filtering framework
for tracking highly deformable objects in the presence of noise and
clutter.


1  Introduction 

The problem of tracking moving and deforming objects has been a topic of
substantial research in the field of active vision; see [1, 2] and the references
therein. There is also an extensive literature with various proposals for tracking
objects with a static shape prior [3]. This paper proposes a novel method to
incorporate dynamic shape priors into the particle filtering framework for tracking
highly deformable objects in the presence of noise and clutter.

In order to appreciate this methodology, we briefly review some previous
related work. The possible parameterizations of planar shapes described as closed
contours are of course very important. Various finite dimensional parameterizations
of continuous curves have been proposed, perhaps most prominently the
B-spline representation used for a “snake model” as in [2]. Isard and Blake (see
[1] and the references therein) use the B-spline representation for contours of
objects and propose the CONDENSATION algorithm [1], which treats the affine
group parameters as the state vector, learns a prior dynamical model for them,
and uses a particle filter [4] to estimate them from the (possibly) noisy
observations. Since this approach only tracks affine parameters, it cannot handle
local deformations of the deforming object.

Another approach for representing contours is via the level set method [5,
6], where the contour is represented as the zero level set of a higher dimensional
function, usually the signed distance function [5]. For segmenting an object, an
initial guess of the contour (represented using the level set function) is deformed
until it minimizes an image-based energy functional. Some previous work on
tracking using level set methods is given in [3, 7-10].

Shape information is quite useful when tracking in clutter, especially if the
object to be tracked gets occluded. Hence, a number of methods have been
proposed [3] which incorporate a static shape prior into the tracking framework.
The approach of these works is based on the idea that the object being tracked
does not undergo a deformation (modulo a rigid transformation). Another method
to obtain a shape prior is to use PCA (principal component analysis) [11]. In
this case, it is assumed that the shape can undergo small variations which can
be captured by linear PCA. However, linear PCA is quite inadequate for
representing the shape variations if the object being tracked undergoes large
deformations (as will be explained in detail in the subsequent sections).

The authors in [12] use a particle filtering algorithm for geometric active
contours to track highly deformable objects. The tracker, however, fails to
maintain the shape of the object being tracked in case of occlusion. The present
work extends the method proposed in [12] by incorporating dynamic shape priors
into the particle filtering framework based on the use of a Locally Linear
Embedding (LLE). LLE [13, 14] attempts to discover the nonlinear structure in
high dimensional data by exploiting the local symmetries of linear
reconstructions. To the best of our knowledge, this is the first time LLE has been
used for shape analysis and tracking. Another approach closely related to our work
was proposed in [15], wherein exemplars were used to learn the distribution of
possible shapes. A different method in [16] separates the space of possible shapes
into different clusters and learns a transition matrix to transition from one patch
of shapes to the next. Our approach is different from those in [15, 16] in that we
do not learn the dynamics of shape variation a priori. The only knowledge required
in our method is a possible set of shapes of the deforming object.

The  literature  reviewed  above  is  by  no  means  exhaustive.  Due  to  paucity  of 
space  we  have  only  quoted  a  few  related  works.  The  rest  of  the  paper  is  organized 
as  follows:  Section  2  gives  the  motivation  and  briefly  describes  the  concepts  of 
LLE,  shape  similarity  measures,  and  curve  evolution.  Section  3  develops  the 
state  space  model  in  detail  and  Section  4  describes  the  experiments  conducted 
to  test  the  proposed  method.  Some  conclusions  and  further  research  directions 
are  discussed  in  Section  5. 

2  Preliminaries 

Principal component analysis (PCA) is one of the most popular dimensionality
reduction techniques. In PCA, one computes the linear projections of
greatest variance from the top eigenvectors of the data covariance matrix. Its first
application to shape analysis [11] in the level set framework was accomplished
by embedding a curve $C$ as the zero level set of a signed distance function $\Phi$.
By doing this, a small set of coefficients can be utilized for a shape prior in
various segmentation tasks as shown in [11, 17]. However, linear PCA assumes that
any required shape can be represented using a linear combination of eigen-shapes,
i.e., any new shape $\tilde{\Phi}$ can be obtained by [17]

$\tilde{\Phi} = \bar{\Phi} + \sum_{i} w_i \Phi_i$,

where $w_i$ are weights assigned to each eigenshape $\Phi_i$ and $\bar{\Phi}$ is the
mean shape. Thus, PCA assumes that the set of training shapes lies on a linear
manifold.

More specifically, let us consider shapes of certain objects with large
deformations; for example, Figure 1 shows a few shapes of a man. PCA was
performed on 75 such shapes (embedded in a signed distance function). Figure 2
shows the original and the reconstructed shape. Thus, linear PCA cannot be
used to obtain a shape prior if the training set lies on a non-linear manifold.


Fig.  1.  Few  shapes  of  a  man  from  a  training  set.  Note  the  large  deformation  in  shape. 


Fig. 2. Left: Original shape, Middle: projection in the PCA basis, Right: LLE (2 nearest
neighbors).

In [18], the authors proposed an unsupervised Locally Linear Embedding
(LLE) algorithm that computes low dimensional, neighborhood preserving
embeddings of high dimensional data. LLE attempts to discover nonlinear structure
in high dimensional data by exploiting the local symmetries of linear
combinations. It has been used in many pattern recognition problems for
classification. In this work, we use it in the particle filtering framework to
provide a dynamic shape prior.


2.1  Locally  Linear  Embedding  for  Shape  Analysis 

The LLE algorithm [14] is based on certain simple geometric intuitions. Suppose
the data consists of $N$ vectors $\Phi_i$ sampled from some smooth underlying
manifold. Provided there is sufficient data, we expect each data point and its
neighbors to lie on or close to a locally linear patch of the manifold. We can
characterize the local geometry of these patches by linear coefficients that
reconstruct each data point from its neighbors. In the simplest formulation of LLE,
one identifies $k$ nearest neighbors for each data point. Reconstruction error is
then measured by the cost function

$E(W) = \left| \Phi - \sum_j w_j \Phi_j \right|^2$.

We seek to minimize the reconstruction error $E(W)$, subject to the constraint that
the weights $w_j$ that lie outside the neighborhood are zero and $\sum_j w_j = 1$.
With these constraints, the weights for points in the neighborhood of $\Phi$ can be
obtained as [13]:

$w_j = \frac{\sum_{m=1}^{k} R_{jm}}{\sum_{p=1}^{k} \sum_{q=1}^{k} R_{pq}}$, where $Q_{jm} = (\Phi - \Phi_j)^T (\Phi - \Phi_m)$, $R = Q^{-1}$  (1)

In this work, we assume that a closed curve $C_j$ is represented as the zero level
set of a signed distance function $\Phi_j$. Stacking all the columns of $\Phi_j$ one
below the other, one can obtain a vector of dimension $D^2$, if $\Phi_j$ is of
dimension $D \times D$. (In the rest of the paper, we use $\Phi$ interchangeably to
represent a vector of dimension $D^2$ or a matrix of dimension $D \times D$. The
appropriate dimension can be inferred from the context.) Figure 2 shows a
particular shape being represented by 2 of its nearest neighbors.
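The weight formula in equation (1) can be illustrated with a small, standard-library-only sketch. It is not the authors' implementation: the function names `solve_linear` and `lle_weights` and the regularization parameter `reg` are assumptions introduced here. The sketch uses the fact that the row sums of $R = Q^{-1}$ are the solution $y$ of $Q y = \mathbf{1}$, so the weights are $y$ normalized to sum to one.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def lle_weights(phi, neighbors, reg=1e-9):
    """Reconstruction weights of `phi` from its neighbors, following equation (1)."""
    k = len(neighbors)
    diffs = [[p - q for p, q in zip(phi, nb)] for nb in neighbors]
    # Local Gram matrix Q_jm = (phi - phi_j)^T (phi - phi_m).
    gram = [[sum(x * y for x, y in zip(dj, dm)) for dm in diffs] for dj in diffs]
    # Assumption: a small ridge term keeps Q invertible when phi lies exactly
    # in the affine span of its neighbors (Q is singular in that case).
    trace = sum(gram[j][j] for j in range(k))
    eps = reg * trace if trace > 0 else reg
    for j in range(k):
        gram[j][j] += eps
    # Q y = 1 gives the row sums of R = Q^{-1}; normalizing y gives the weights.
    y = solve_linear(gram, [1.0] * k)
    total = sum(y)
    return [v / total for v in y]


def reconstruct(neighbors, weights):
    """Weighted combination of the neighbors (the LLE reconstruction)."""
    dim = len(neighbors[0])
    return [sum(w * nb[i] for w, nb in zip(weights, neighbors)) for i in range(dim)]
```

In practice each `phi` would be a flattened $D \times D$ signed distance map and `neighbors` its $k$ nearest training shapes; the toy vectors used for checking are only there to make the arithmetic easy to follow.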


2.2  Finding  the  Nearest  Neighbors 

The previous section showed how to represent a shape $\Phi_i$ by a linear
combination of its $k$ neighbors. Here we consider the key issue of how to find the
nearest neighbors. One might be tempted to use the Euclidean 2-norm to find the
distance between shapes, i.e., if $d^2(\Phi_i, \Phi_j)$ is the (squared) distance
between $\Phi_i$ and $\Phi_j$, then $d^2(\Phi_i, \Phi_j) = \| \Phi_i - \Phi_j \|^2$.
However, this norm does not represent distance between shapes, but only distance
between two vectors. Since we are looking for the nearest neighbors of $C_i$ in the
shape space, a similarity measure between shapes is a more appropriate choice. Many
measures of similarity have been reported; see [19, 3, 20]. In this paper, we have
chosen the following distance measure [21]:

$d^2(\Phi_i, \Phi_j) = \int_{p \in Z(\Phi_i)} EDT_{\Phi_j}(p)\,dp + \int_{p \in Z(\Phi_j)} EDT_{\Phi_i}(p)\,dp$  (2)

where $EDT_{\Phi_i}$ is the Euclidean distance function of the zero level set of
$\Phi_i$ (one can think of it as the absolute value of $\Phi_i$), and $Z(\Phi_i)$ is
the zero level set of $\Phi_i$. We chose this particular distance measure because it
allows for partial shape matching, which is quite useful for occlusion handling.
More details about this measure may be found in [21]. We should note that the
development of the remaining algorithm does not depend on the choice of the
distance measure. Thus, once the distance measure between each $\Phi_i$ and the rest
of the elements in the training set is known, one can find the nearest neighbors
of $\Phi_i$.
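A discrete analogue of the distance measure (2) can be sketched as follows. This is an illustrative approximation rather than the method of [21]: each contour is assumed to be given as a list of sample points on its zero level set, the Euclidean distance function is evaluated by brute force, and the integrals become sums. The names `shape_distance` and `nearest_neighbors` are hypothetical.

```python
import math


def dist_to_contour(point, contour):
    """Distance from `point` to the closest sample of `contour` (a discrete EDT)."""
    return min(math.dist(point, q) for q in contour)


def shape_distance(contour_a, contour_b):
    """Symmetric distance: sum the EDT of each contour along the other contour."""
    term_ab = sum(dist_to_contour(p, contour_b) for p in contour_a)
    term_ba = sum(dist_to_contour(p, contour_a) for p in contour_b)
    return term_ab + term_ba


def nearest_neighbors(query, training_set, k):
    """Indices of the k training contours closest to `query`."""
    order = sorted(
        range(len(training_set)),
        key=lambda i: shape_distance(query, training_set[i]),
    )
    return order[:k]
```

On a pixel grid one would normally precompute the distance transform of each training shape once instead of searching all point pairs; the brute-force version is kept here only because it needs no external libraries.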


2.3  Curve  Evolution 

There is a large literature concerning the problem of separating an object from
its background [3, 9]. Level sets have been used quite successfully for this task.
In [22], the authors have proposed a variational framework for segmenting an
object using the first two moments (mean and variance) of image intensities. In
the present work, we have used the energy functional given in [22]

$E_{image} = \int_{\Omega} \left( \log \sigma_u^2 + \frac{(I(x) - u)^2}{\sigma_u^2} \right) H(\Phi)\,dx + \int_{\Omega} \left( \log \sigma_v^2 + \frac{(I(x) - v)^2}{\sigma_v^2} \right) (1 - H(\Phi))\,dx + \nu \int_{\Omega} \| \nabla H(\Phi) \|\,dx$,  (3)

which upon minimization gives the following PDE:

$\frac{\partial \Phi}{\partial t} = \delta_\epsilon(\Phi) \left( \nu\, \mathrm{div}\!\left( \frac{\nabla \Phi}{\| \nabla \Phi \|} \right) + \log \frac{\sigma_v^2}{\sigma_u^2} + \frac{(I(x) - v)^2}{\sigma_v^2} - \frac{(I(x) - u)^2}{\sigma_u^2} \right)$.  (4)

Here $I(x)$ is the image, $u$ and $v$ are the mean intensities inside and outside the
curve $C$, respectively, $\sigma_u^2$ and $\sigma_v^2$ are the respective variances,
$\delta_\epsilon(\Phi) = \frac{dH}{d\Phi}$ is the Dirac delta function, and $H$ is the
Heaviside function as defined in [22]. Note that one could use any type of curve
evolution equation in the algorithm being proposed. We have made this particular
choice because it is simple yet powerful in segmenting cluttered images.
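To make the role of the two region terms concrete, the following sketch evaluates the data part of the image energy for a binary inside/outside mask. It is a simplified illustration under stated assumptions: the mask stands in for $H(\Phi)$, the means and variances are the sample statistics of each region, the contour-length penalty is omitted, and `var_floor` is a hypothetical guard against zero variance.

```python
import math


def region_term(pixels, var_floor=1e-6):
    """Sum over a region of log(variance) + squared deviation / variance."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = max(sum((p - mean) ** 2 for p in pixels) / n, var_floor)
    return sum(math.log(var) + (p - mean) ** 2 / var for p in pixels)


def data_energy(image, inside):
    """Region-statistics energy for a 2D image and a boolean inside mask."""
    pix_in, pix_out = [], []
    for row, mask_row in zip(image, inside):
        for value, is_in in zip(row, mask_row):
            (pix_in if is_in else pix_out).append(value)
    if not pix_in or not pix_out:
        raise ValueError("both regions must be non-empty")
    return region_term(pix_in) + region_term(pix_out)


# Toy image: a dark left half and a bright right half.
image = [[10.0, 12.0, 50.0, 52.0] for _ in range(4)]
good_mask = [[True, True, False, False] for _ in range(4)]
bad_mask = [[True, False, True, False] for _ in range(4)]
```

A mask that separates the two intensity populations (`good_mask`) yields a lower energy than one that mixes them (`bad_mask`), which is the behavior the curve evolution exploits.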


3  The  State  Space  Model 

This  section  describes  the  state  space  model,  the  prediction  model,  and  the 
importance  sampling  concept  used  within  the  particle  filtering  framework  for 
tracking  deformable  objects.  We  will  employ  the  basic  theory  of  particle  filtering 
here  as  described  in  [4]. 

Let $S_t$ denote the state vector at time $t$. The state consists of parameters
$T$ that model the rigid (or affine) motion of the object (e.g., $T = [x\ y\ \theta]$
for Euclidean motion) and the curve $C$ (embedded as the zero level set of $\Phi$),
which models the shape of the object, i.e., $S_t = [T_t\ \Phi_t]$. The observation is
the image at time $t$, i.e., $Y_t = Image(t)$. Our goal is to recursively estimate the
posterior distribution $p(S_t | Y_{1:t})$ given the prior $p(S_{t-1} | Y_{1:t-1})$. This
involves a time update step and a measurement update step as described in
Sections 3.1 and 3.2.

In general, it is quite difficult to obtain a model for predicting the position
and shape of the deforming object. More specifically, in the current case, it is
very difficult to obtain samples from the infinite dimensional space of closed
curves (shapes). This problem can be solved using Bayesian importance
sampling [23], described briefly below. Suppose $p(x)$ is a probability density from
which it is difficult to draw samples (but for which $p(x)$ can be evaluated) and
$q(x)$ is a density which is easy to sample from and has a heavier tail than
$p(x)$ (i.e., there exists a bounded region $R$ such that for all points outside $R$,
$q(x) > p(x)$). $q(x)$ is known as the proposal density or the importance density.
Let $x^i \sim q(x)$, $i = 1, \ldots, N$ be samples generated from $q(\cdot)$. Then, an
approximation to $p(\cdot)$ is given by $p(x) \approx \sum_{i=1}^{N} \omega^i \delta(x - x^i)$,
where $\omega^i \propto \frac{p(x^i)}{q(x^i)}$ is the normalized weight of the i-th
particle. So, if the samples $S_t^{(i)}$ were drawn from an importance density
$q(S_t | S_{1:t-1}, Y_{1:t})$ and weighted by
$\omega_t^{(i)} \propto \frac{p(S_t^{(i)} | Y_{1:t})}{q(S_t^{(i)} | S_{1:t-1}, Y_{1:t})}$,
then $\sum_{i=1}^{N} \omega_t^{(i)} \delta(S_t^{(i)} - S_t)$ approximates $p(S_t | Y_{1:t})$.
The choice of the importance density is a critical design issue for implementing a
successful particle filter. As described in [24], the proposal distribution
$q(\cdot)$ should be such that particles generated by it lie in the regions of high
observation likelihood. Another important requirement is that the variance of the
weights $\omega^i$ should not increase over time. Various algorithms have been
proposed [24] to achieve this objective. One way of doing this is to use an
importance density which depends on the current observation. This idea has been
used in many past works such as the unscented particle filter [25], where the
proposal density is a Gaussian density with a mean that depends on the current
observation. In this work, we propose a possible importance density function
$q(S_t | S_{t-1}, Y_t)$ and show how to obtain samples from it. Note that the space of
closed curves from which we want to obtain samples is infinite dimensional.
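The importance sampling idea described above can be tried on a one-dimensional toy problem. The sketch below is only an illustration and is unrelated to the curve-valued densities of this paper: the target is assumed to be a standard normal, the proposal a wider normal (so that it has heavier tails), and the helper names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, sigma):
    """Density of a zero-mean normal with standard deviation `sigma`."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def importance_estimate(target_pdf, proposal_sample, proposal_pdf, func, n, rng):
    """Estimate the expectation of func(x) under the target using proposal samples."""
    samples = [proposal_sample(rng) for _ in range(n)]
    # Unnormalized weights p(x_i) / q(x_i), then normalized to sum to one.
    raw = [target_pdf(x) / proposal_pdf(x) for x in samples]
    total = sum(raw)
    weights = [w / total for w in raw]
    return sum(w * func(x) for w, x in zip(weights, samples))


rng = random.Random(0)
second_moment = importance_estimate(
    target_pdf=lambda x: gaussian_pdf(x, 1.0),
    proposal_sample=lambda r: r.gauss(0.0, 3.0),
    proposal_pdf=lambda x: gaussian_pdf(x, 3.0),
    func=lambda x: x * x,
    n=20000,
    rng=rng,
)
```

With these settings `second_moment` should be close to 1, the second moment of the standard normal target, even though no sample was drawn from the target itself.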


3.1  Time  Update 

The prediction $\hat{S}_t$ at time $t$ is given by $\hat{S}_t = f_t(S_{t-1}, i_t, n_t)$,
where $n_t$ is a random noise vector, $i_t$ is any user defined input data (in our
case, it is the set of training data) and $f_t$ is possibly a nonlinear function.
The problem of tracking deforming objects can be separated into two parts [8]:

1.  Tracking  the  global  rigid  motion  of  the  object; 

2.  Tracking  local  deformations  in  the  shape  of  the  object,  which  can  be  defined 
as  any  departure  from  rigidity. 

Accordingly, we assume that the parameters that represent rigid motion $T_t$ and
the parameters that represent the shape $\Phi_t$ are independent. Thus, it is assumed
that the shape of an object does not depend on its location in the image, but only
on its previous shape, and the location of an object in space does not depend on
the previous shape. Hence, the prediction step consists of predicting the spatial
position of the object using $\hat{T}_t = T_{t-1} + n_t^{(T)}$, where $n_t^{(T)}$ is a
random Gaussian noise vector with variance $\sigma_T^2$. The prediction for shape
$\hat{\Phi}_t$ is obtained as follows:

$\hat{\Phi}_t = p_0 \Phi_{t-1} + p_1 \Phi_{t-1}^{(N_1)} + p_2 \Phi_{t-1}^{(N_2)} + \ldots + p_k \Phi_{t-1}^{(N_k)}$  (5)

where $p_0, p_1, \ldots, p_k$ are user defined weights such that $\sum_i p_i = 1$ and
$\Phi_{t-1}^{(N_i)}$, $i = 1, \ldots, k$, are the $k$ nearest neighbors of $\Phi_{t-1}$. The
nearest neighbors are obtained as described in Section 2.2. A more generalized
formulation of the prediction step above can be obtained by sampling the weights
$p_i$ from a known distribution (for example, an exponential distribution) to obtain
a set of possible shapes and then choosing the predicted shape from this set,
based on certain criteria.
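The shape prediction step can be sketched in a few lines. The code below is a hedged illustration, not the authors' implementation: shapes are assumed to be flattened level set functions stored as lists of floats, `predict_shape` forms the weighted combination of the previous shape and its neighbors, and `sample_weights` shows one possible way to draw the weights from an exponential distribution and normalize them, as suggested in the generalized formulation.

```python
import random


def predict_shape(prev_shape, neighbors, weights):
    """Combine the previous shape and its k neighbors with weights [p0, p1, ..., pk]."""
    if len(weights) != len(neighbors) + 1:
        raise ValueError("need one weight for the previous shape plus one per neighbor")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    shapes = [prev_shape] + list(neighbors)
    return [
        sum(w * shape[i] for w, shape in zip(weights, shapes))
        for i in range(len(prev_shape))
    ]


def sample_weights(k, rng, rate=1.0):
    """Draw k + 1 exponential variates and normalize them to sum to one.

    The rate is an assumed parameter; the text does not specify one.
    """
    raw = [rng.expovariate(rate) for _ in range(k + 1)]
    total = sum(raw)
    return [r / total for r in raw]


prev = [0.0, 2.0]
nbrs = [[2.0, 2.0], [4.0, 6.0]]
predicted = predict_shape(prev, nbrs, [0.5, 0.25, 0.25])
```

Calling `predict_shape` repeatedly with weights from `sample_weights` gives a set of candidate shapes per particle, from which one could be chosen by a criterion of the user's choice.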

We should note that one of the main contributions of this paper is the
formulation of a scheme that allows the shape of the object to be predicted
dynamically without learning the sequence in which the shapes occur (unlike the
methods in [15, 16]). Thus, the only knowledge required in this prediction step is
a training set of shapes. In particular, one does not need to sample from an
infinite-dimensional space of shapes (curves) but only from a set containing the
linear combinations of the $k$ nearest neighbors of $\Phi_{t-1}$. This not only reduces
the search space dramatically, but also allows one to sample from a finite set of
possible shapes. Once the latest observation $Y_t$ is obtained, one can update the
prediction based on this information as explained in the following section.

3.2  Measurement  Update 

At time $t$, for each particle $i$, generate samples $\hat{S}_t^{(i)}$ as described in
the prediction step in (5). Using the image at time $t$ ($I_t$), a rigid
transformation is applied to each $\hat{\Phi}_t^{(i)}$ (in particular $\hat{C}_t^{(i)}$)
by doing $L_r$ iterations of gradient descent on the image energy $E_{image}$ with
respect to the rigid transformation parameters $T$. The curve is then deformed by
doing a few ($L_d$) iterations of gradient descent (“curve evolution”) on the
energy $E$, i.e., we generate

$\check{\Phi}_t^{(i)} = f_R^{L_r}(\hat{\Phi}_t^{(i)}, Y_t)$, $\Phi_t^{(i)} = f_{CE}^{L_d}(\check{\Phi}_t^{(i)}, Y_t)$  (6)

where $f_R^{L_r}(\Phi, Y)$ is given by (for $j = 1, 2, \ldots, L_r$)

$r^0 = r$, $r^j = r^{j-1} - \alpha^j \nabla_r E_{image}(r^{j-1}, \Phi, Y)$, $T = r^{L_r}$, $f_R^{L_r}(\Phi, Y) = T\Phi$  (7)

and $f_{CE}^{L_d}(\mu, Y)$ is given by (for $j = 1, 2, \ldots, L_d$)

$\mu^0 = \mu$, $\mu^j = \mu^{j-1} - \alpha^j \nabla_\mu E(\mu^{j-1}, Y)$, $f_{CE}^{L_d}(\mu, Y) = \mu^{L_d}$  (8)

where $E = E_{image} + \beta E_{shape}$. The energy $E_{image}$ is as defined in equation (3)
and $E_{shape}$ is defined by [3]: $E_{shape}(\Phi) = \int \tilde{\Phi}(x)\,dx$, where
$\tilde{\Phi}$ is the contour obtained from a linear combination of the nearest
neighbors of $\Phi$, with weights obtained using LLE from equation (1). The
corresponding curve evolution equation is given by

$\frac{\partial \Phi}{\partial t} = \tilde{\Phi}(x)\, \| \nabla \Phi \|$.  (9)

This PDE tries to drive the current contour shape $\Phi$ towards the space of possible
shapes and equation (8) tries to drive the current contour towards the minimizer
of energy $E$, which depends on the image and shape information. The parameter
$\beta$ is user defined and weights the shape information against the image
information. The use of LLE to provide shape information for contour evolution is
another main contribution of this paper.

Details about equation (7) can be obtained from [17], and equation (8)
may be implemented by summing the PDEs (4) and (9). We perform only
$L$ ($L_d$ or $L_r$) iterations of gradient descent since we do not want to evolve the
curve until it reaches a minimizer of the energy, $E_{image}$ (or $E$). Evolving to the
local minimizer is not desirable since the minimizer would be independent of
all starting contours in its domain of attraction and would only depend on the
observation, $Y_t$. Thus the state at time $t$ would lose its dependence on the state
at time $t - 1$ and this may cause loss of track in cases where the observation
is bad. In effect, choosing $L$ to be too large can move all the samples too close
to the current observation, while a small $L$ may not move the particles towards
the desired region. The choice of $L$ depends on how much one trusts the system
model versus the obtained measurements. Note that $L$ will of course also depend
on the step-size of the gradient descent algorithm as well as the type of PDE
used in the curve evolution equation.
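The trade-off in choosing $L$ can be seen on a scalar toy problem. The sketch below is an assumption-laden analogy, not the curve evolution itself: each "particle" is a number, the energy is the quadratic $\frac{1}{2}(x - y)^2$ with a single minimizer at the observation $y$, and `gradient_descent` runs a fixed number of steps.

```python
def gradient_descent(start, grad, steps, step_size):
    """Run a fixed number of gradient descent steps from `start`."""
    x = start
    for _ in range(steps):
        x = x - step_size * grad(x)
    return x


def spread(values):
    """Range of the particle set; zero means all particles coincide."""
    return max(values) - min(values)


observation = 10.0


def grad(x):
    # Gradient of the toy energy 0.5 * (x - observation)^2.
    return x - observation


particles = [0.0, 2.0, 4.0]
few_steps = [gradient_descent(p, grad, 3, 0.1) for p in particles]
many_steps = [gradient_descent(p, grad, 200, 0.1) for p in particles]
```

After three steps the particles have moved towards the observation but still differ according to where they started; after two hundred steps they have essentially collapsed onto the minimizer, so the dependence on the previous state is lost, which is the situation the text warns against.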

For each $i$, the sample $S_t^{(i)}$ thus obtained is drawn from the importance
density $q(S_t^{(i)} | S_{t-1}^{(i)}, Y_t) = \mathcal{N}(S_t^{(i)}, \Sigma)$, where we assume a
Gaussian fit for the density $q(\cdot)$ centered at each $S_t^{(i)}$. We further assume
that the variance $\Sigma$ is very small and constant for all particles, i.e.,
$q(S_t^{(i)} | S_{t-1}^{(i)}, Y_t) = $ constant. We should note that this methodology,
even though sub-optimal (to the best of our knowledge, an optimal method to sample
from an infinite dimensional space of curves does not exist), allows one to obtain
samples that lie in regions of high likelihood. The above mentioned step of doing
gradient descent can also be interpreted as an MCMC move step, where particles are
“moved” to regions of high likelihood by any available means, as given in [24].


3.3  Setting  the  Importance  Weights 

In this paper, the state process is assumed to be Markov, and the observations are
conditionally independent given the current state, i.e., $p(Y_t | S_{0:t}) = p(Y_t | S_t)$.
This gives the following recursion for the weights [23]:

$\omega_t^{(i)} \propto \omega_{t-1}^{(i)} \frac{p(Y_t | S_t^{(i)})\, p(S_t^{(i)} | S_{t-1}^{(i)})}{q(S_t^{(i)} | S_{t-1}^{(i)}, Y_t)}$.

The probability $p(Y_t | S_t)$ is defined as $p(Y_t | S_t) \propto e^{-E(S_t, Y_t)/\sigma_{tot}^2}$.
We define $p(S_t | S_{t-1}) = p(T_t | T_{t-1})\, p(\Phi_t | \Phi_{t-1})$ with

$p(T_t | T_{t-1}) \propto e^{-\| T_t - T_{t-1} \|^2 / \sigma_T^2}$, $p(\Phi_t | \Phi_{t-1}) \propto e^{-d^2(\Phi_t, \Phi_{t-1}) / \sigma_d^2} + a\, e^{-d^2(\Phi_t, \Phi^{MAP}_{t-1}) / \sigma_d^2}$  (10)

where $d^2$ is the (squared) distance measure defined above in (2), and
$\Phi^{MAP}_{t-1}$ is the MAP (maximum a-posteriori) estimate of the shape at time
$t - 1$. We should note that using the MAP shape information available from time
$t - 1$ is quite essential, since it adds weight to particles which are closer to
the previous best estimate than particles that are far away. This is quite useful
in case of occlusion, wherein particles which look like the previous best shape are
given higher probability, despite the occlusion. The parameter $a$ is user defined.

Based on the discussion above, the particle filtering algorithm can be written
as follows:

• Use equation (5) to obtain $\hat{T}_t$, $\hat{\Phi}_t$.

• Perform $L_r$ steps of gradient descent on rigid parameters using (7) and $L_d$
iterations of curve evolution using (8).

• Calculate the importance weights, normalize and resample [4], i.e.,

$\tilde{\omega}_t^{(i)} \propto p(Y_t | S_t^{(i)})\, p(T_t^{(i)} | T_{t-1}^{(i)})\, p(\Phi_t^{(i)} | \Phi_{t-1}^{(i)})$, $\omega_t^{(i)} = \frac{\tilde{\omega}_t^{(i)}}{\sum_{j=1}^{N} \tilde{\omega}_t^{(j)}}$  (11)
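The final step of the algorithm (weighting, normalizing, resampling) can be sketched generically. The code below is an illustration under assumptions, not the authors' implementation: the observation likelihood is taken as an exponential of the negative energy, the transition priors are passed in as precomputed numbers, and multinomial resampling is used as one common choice (the source cites [4] without naming a specific scheme).

```python
import math
import random


def observation_likelihood(energy, sigma_tot_sq):
    """Likelihood up to a constant, assumed to be exp(-E / sigma_tot^2)."""
    return math.exp(-energy / sigma_tot_sq)


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("all particle weights are zero")
    return [w / total for w in weights]


def resample(particles, weights, rng):
    """Multinomial resampling: draw N particles with probability given by weight."""
    return rng.choices(particles, weights=weights, k=len(particles))


def update_particles(particles, energies, priors, sigma_tot_sq, rng):
    """Weight each particle by likelihood times prior, normalize, and resample."""
    raw = [
        observation_likelihood(e, sigma_tot_sq) * p
        for e, p in zip(energies, priors)
    ]
    weights = normalize(raw)
    return resample(particles, weights, rng), weights
```

Here `priors` stands for the product of the motion and shape transition terms for each particle; particles with low energy and shapes close to the previous estimate are duplicated, while unlikely ones tend to be dropped.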


4  Experiments 

The  proposed  algorithm  was  tested  on  3  different  sequences  and  the  results  are 
presented  in  this  section.  We  certainly  do  not  claim  that  the  method  proposed  in 
this  paper  is  the  best  one  for  every  image  sequence  on  which  it  was  tested,  but 
it  did  give  very  good  results  with  a  small  number  of  particles  on  all  of  the  image 
sequences.  We  should  add  that  to  the  best  of  our  knowledge  this  is  the  first  time 
dynamic  shape  prior  in  a  level  set  framework  has  been  used  in  conjunction  with 
the  particle  filter  [4]  for  tracking  such  deforming  objects. 

In  all  of  the  test  sequences,  we  have  used  the  following  parameters  which 
gave  good  results: 

1. Choosing $k$, the number of nearest neighbors (for obtaining $\tilde{\Phi}(x)$ in eqn (9)):
$k$ will depend on the number of similar shapes available in the training set
[18]. In our experiments, $k = 2$ gave acceptable results.

2. Choosing $\sigma_d^2$: A classical choice [20] is
$\sigma_d^2 = c\, \frac{1}{N} \sum_{i=1}^{N} \min_{j \neq i} d^2(\Phi_i, \Phi_j)$. For
all the test sequences, $c = 1/20$ was used.

3. $\sigma_T^2$ models the motion dynamics of the object being tracked. In all the test
sequences, since the spatial motion of the object was not large, we used
$\sigma_T^2 = 1000$. Also, only translational motion was assumed, i.e., $T = [x\ y]$.
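The rule for the shape variance in item 2 can be computed directly from a matrix of pairwise squared distances between training shapes. The sketch below reads the formula as $c$ times the mean nearest-neighbor squared distance; the function name `sigma_d_squared` and the list-of-lists matrix layout are assumptions made for illustration.

```python
def sigma_d_squared(dist_sq, c=1.0 / 20.0):
    """c times the average squared distance from each shape to its nearest neighbor.

    `dist_sq[i][j]` is assumed to hold d^2 between training shapes i and j.
    """
    n = len(dist_sq)
    if n < 2:
        raise ValueError("need at least two training shapes")
    nearest = [
        min(dist_sq[i][j] for j in range(n) if j != i)
        for i in range(n)
    ]
    return c * sum(nearest) / n


# Toy symmetric distance matrix for three shapes.
toy = [
    [0.0, 4.0, 9.0],
    [4.0, 0.0, 1.0],
    [9.0, 1.0, 0.0],
]
value = sigma_d_squared(toy)
```

For the toy matrix the nearest-neighbor squared distances are 4, 1 and 1, so the mean is 2 and `value` is approximately 0.1 with the default $c = 1/20$.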

4.1  Shark  Sequence 

This sequence has very low contrast (object boundaries in some images are
barely visible even to human observers) with a shark moving amid a lot of other
fish which partially occlude it simultaneously in many places. This results in a
dramatic change in the image statistics of the shark if a fish from the background
occludes the shark. The training set was obtained by hand segmenting 10% of
the images from the image sequence. Tracking results without shape information
using the algorithm in [12] are shown in Figure 3. As can be seen, even though
the algorithm tracks the shark, it is unable to maintain the shape. Results using
the proposed algorithm are shown in Figure 3. This sequence demonstrates the
robustness of the proposed algorithm in the presence of noise and clutter. The
following parameters were used in tracking this sequence: $L_r = 1$, $L_d = 10$,
particles = 40.


4.2  Octopus  Sequence

As seen in Figure 4, the shape of the octopus undergoes large changes as it
moves in a cluttered environment. It gets occluded for several frames by a fish
having the same mean intensity. Tracking this sequence using equation (4) or
any other method without shape information may result in the curve leaking to
encompass the fish. Figure 4 shows tracking results using the proposed method.
The following set of parameters was used in tracking this sequence: $L_r = 3$,
$L_d = 10$, particles = 50; the training set included 9% of the possible shapes.


4.3  Soccer  Sequence

This  sequence  tracks  a  man  playing  soccer.  There  is  large  deformation  in  the 
shape  due  to  movement  of  the  limbs  (hands  and  legs)  as  the  person  tosses  the 
ball  around.  The  deformation  is  also  great  from  one  frame  to  next  when  the 
legs  occlude  each  other  and  separate  out.  There  is  clutter  in  the  background 
which  would  cause  leaks  if  geometric  active  contours  or  the  particle  filtering 
algorithm  given  in  [12]  were  used  to  track  this  sequence  (see  Figure  5).  Results 
of  tracking  using  the  proposed  method  are  shown  in  Figure  5.  The  following  set 
of  parameters  were  used  to  track  this  sequence:  Lr  =  5,  Ld  =  18,  particles  =  50, 
and  20%  of  the  possible  shapes  were  included  in  the  training  set  (see  Figure  1). 

5  Conclusions  and  Limitations 

In  this  paper,  we  have  presented  a  novel  method  which  incorporates  dynamic 
shape  prior  information  into  a  particle  filtering  algorithm  for  tracking  highly 
deformable  objects  in  presence  of  noise  and  clutter.  The  shape  prior  information 
is  obtained  using  Locally  Linear  Embedding  (LLE)  for  shapes.  No  motion  or 
shape  dynamics  are  required  to  be  known  for  tracking  complex  sequences,  i.e., 
no  learning  is  required.  The  only  information  needed  is  a  set  of  shapes  that  can 
appropriately  represent  the  various  deformations  of  the  object  being  tracked. 

Nevertheless,  the  current  algorithm  has  certain  limitations.  First,  it  is  compu¬ 
tationally  very  expensive,  as  each  particle  has  to  be  evolved  for  many  iterations. 
Second,  the  training  set  should  contain  sufficient  number  of  possible  shapes  of 
the  object  being  tracked  so  that  LLE  can  be  used. 

Fig. 3. First row shows tracking results with no shape information. Next two rows
show results using the proposed algorithm.


Fig.  4.  Octopus  Sequence:  Results  using  the  proposed  algorithm.  Notice  that  a  fish 
with  the  same  mean  intensity  occludes  the  octopus. 


Fig.  5.  Results  of  tracking  using  the  proposed  method.  Last  image  at  the  bottom  right 
is  the  segmentation  using  equation  (4)  without  any  shape  information. 


